Efficient Exploration in Resource-Restricted Reinforcement Learning
Authors
Abstract
In many real-world applications of reinforcement learning (RL), performing actions requires consuming certain types of resources that are non-replenishable within an episode. Typical examples include robotic control with limited energy and video games with consumable items. In tasks with such resources, we observe that popular RL methods such as soft actor critic suffer from poor sample efficiency. The major reason is that they tend to exhaust the resources quickly, so subsequent exploration is severely restricted by the absence of resources. To address this challenge, we first formalize the aforementioned problem as resource-restricted reinforcement learning, and then propose a novel resource-aware exploration bonus (RAEB) to make reasonable use of resources. An appealing feature of RAEB is that it can significantly reduce unnecessary resource-consuming trials while effectively encouraging the agent to explore unvisited states. Experiments demonstrate that the proposed method outperforms state-of-the-art exploration strategies in resource-restricted environments, improving sample efficiency by up to an order of magnitude.
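The abstract does not give the exact form of the RAEB, so the following is only a minimal, hypothetical sketch of the stated idea: reward the agent for reaching novel states, but discount trials that spend the non-replenishable resource with little novelty gain. All names, parameters, and the count-based novelty term are assumptions for illustration, not the paper's formula.

```python
import numpy as np
from collections import defaultdict


class ResourceAwareBonus:
    """Illustrative resource-aware exploration bonus (not the paper's RAEB).

    Combines a simple count-based novelty term with a penalty on consuming
    the episode's non-replenishable resource, so wasteful trials are discouraged.
    """

    def __init__(self, beta=0.1, resource_penalty=0.05):
        self.beta = beta                          # scale of the novelty bonus (assumed)
        self.resource_penalty = resource_penalty  # scale of the consumption penalty (assumed)
        self.visit_counts = defaultdict(int)      # counts over a discretized state key (assumed)

    def __call__(self, state_key, resource_cost, resources_left):
        """Return an intrinsic bonus for one transition.

        state_key      -- hashable discretization of the next state (assumption)
        resource_cost  -- amount of resource the action consumed
        resources_left -- remaining resource budget in the current episode
        """
        self.visit_counts[state_key] += 1
        novelty = self.beta / np.sqrt(self.visit_counts[state_key])
        # Penalize consumption more heavily as the remaining budget shrinks,
        # so the agent avoids exhausting resources on already-visited states.
        waste = self.resource_penalty * resource_cost / max(resources_left, 1e-8)
        return novelty - waste
```

In use, such a bonus would simply be added to the environment reward before updating the policy (e.g. soft actor critic), which is the standard way intrinsic exploration bonuses are applied.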
Similar resources
Efficient Exploration in Reinforcement Learning
An agent acting in a world makes observations, takes actions, and receives rewards for the actions taken. Given a history of such interactions, the agent must make the next choice of action so as to maximize the long term sum of rewards. To do this well, an agent may take suboptimal actions which allow it to gather the information necessary to later take optimal or near-optimal actions with res...
Efficient Exploration for Reinforcement Learning
Reinforcement learning is often regarded as one of the hardest problems in machine learning. Algorithms for solving these problems often require copious resources in comparison to other problems, and will often fail for no obvious reason. This report surveys a set of algorithms for various reinforcement learning problems that are known to terminate with a good solution after a number of interacti...
Resource Constrained Exploration in Reinforcement Learning
This paper examines temporal difference reinforcement learning (RL) with adaptive and directed exploration for resource-limited missions. The scenario considered is for an energy-limited agent which must explore an unknown region to find new energy sources. The presented algorithm uses a Gaussian Process (GP) regression model to estimate the value function in an RL framework. However, to avoid ...
Learning to soar: Resource-constrained exploration in reinforcement learning
This paper examines temporal difference reinforcement learning with adaptive and directed exploration for resource-limited missions. The scenario considered is that of an unpowered aerial glider learning to perform energy-gaining flight trajectories in a thermal updraft. The presented algorithm, eGP-SARSA(l), uses a Gaussian process regression model to estimate the value function in a reinforcem...
Efficient Reinforcement Learning via Initial Pure Exploration
In several realistic situations, an interactive learning agent can practice and refine its strategy before going on to be evaluated. For instance, consider a student preparing for a series of tests. She would typically take a few practice tests to know which areas she needs to improve upon. Based on the scores she obtains in these practice tests, she would formulate a strategy for maximizing he...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i8.26224